
Feat/attafosu/sglang OpenAI api compatibility#156

Closed
attafosu wants to merge 5 commits into main from feat/attafosu/sglang-openai-api-compatibility

Conversation


@attafosu attafosu commented Mar 9, 2026

What does this PR do?

Adds a unified dataset preset supporting both the OpenAI-compatible and native SGLang APIs for CNN/DailyMail.

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

attafosu added 2 commits March 4, 2026 11:35
Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
Signed-off-by: attafosu <thomas.atta-fosu@intel.com>

Committer: attafosu <thomas.atta-fosu@intel.com>
@attafosu attafosu requested a review from a team as a code owner March 9, 2026 20:15
Copilot AI review requested due to automatic review settings March 9, 2026 20:15

github-actions bot commented Mar 9, 2026

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@github-actions github-actions bot requested review from arekay-nv and nvzhihanj March 9, 2026 20:15
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new feature that expands the dataset management capabilities by adding a llama3_8b_sglang preset for the CNN Dailymail dataset. This preset facilitates compatibility with both OpenAI and native SGLang APIs, streamlining the process of preparing data for inference. A key part of this enhancement is the modification of the Harmonize data transform, which now supports a "plain" tokenization mode, offering greater flexibility in prompt processing. Additionally, minor adjustments were made to OpenAI API type definitions to improve robustness.

Highlights

  • New Dataset Preset: Introduced a new llama3_8b_sglang preset for the CNN Dailymail dataset, supporting both OpenAI-compatible and native SGLang API usage.
  • Enhanced Harmonize Transform: Enhanced the Harmonize data transformation to include a "plain" mode, allowing for direct tokenization of raw prompts without conversation harmonization.
  • OpenAI API Type Updates: Updated OpenAI API type definitions to explicitly mark refusal, usage, and system_fingerprint fields as optional with default None values.


Changelog
  • src/inference_endpoint/dataset_manager/predefined/cnndailymail/presets.py
    • Imported the Harmonize transform.
    • Added a new function llama3_8b_sglang which defines a list of transforms for prompt formatting and plain tokenization using Harmonize.
  • src/inference_endpoint/dataset_manager/transforms.py
    • Modified the Harmonize class constructor to accept a mode parameter ("harmony" or "plain").
    • Added validation for the mode parameter.
    • Updated the process_row method to conditionally tokenize prompts based on the mode and to guard against overwriting existing tokenized columns.
  • src/inference_endpoint/openai/types.py
    • Set default None values for the refusal field in ChatCompletionResponseMessage.
    • Set default None values for the usage and system_fingerprint fields in ChatCompletionResponse.
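The mode handling added to the Harmonize constructor can be sketched in isolation as follows. This is a minimal stand-alone sketch based on the changelog above; the real class also takes tokenizer/encoding arguments, so the signature here is an assumption.

```python
class Harmonize:
    """Minimal sketch of the transform's new mode handling.

    The real Harmonize in transforms.py also accepts tokenizer and
    encoding arguments; only the mode validation is shown here.
    """

    def __init__(self, mode: str = "harmony"):
        # Reject anything other than the two supported tokenization modes.
        if mode not in {"harmony", "plain"}:
            raise ValueError(f"Invalid harmonize mode: {mode!r}")
        self.mode = mode
```

Validating in the constructor (rather than in `process_row`) means a bad preset fails fast at pipeline-construction time instead of on the first row.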
Activity
  • Pull request created by attafosu.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request introduces a new llama3_8b_sglang preset for the CNN/DailyMail dataset, enhancing compatibility with the SGLang API. It also refines the Harmonize transform by adding a mode parameter and corresponding logic to handle both 'harmony' and 'plain' tokenization, which is crucial for flexible prompt processing. Additionally, optional fields in OpenAI API types (refusal, usage, system_fingerprint) are now explicitly set with None as a default, improving clarity and consistency. The changes are well-structured and contribute positively to the codebase.

Comment on lines +55 to +59
    stream: bool = True,
    max_new_tokens: int = 128,
    temperature: float = 0.0,
    top_p: float = 1.0,
    top_k: int = 1,


Severity: medium

The parameters stream, temperature, top_p, and top_k are defined in the llama3_8b_sglang function signature but are not used within the function body. This indicates dead code, which can be misleading and suggests that these parameters might be intended for future use or were overlooked. If these parameters are not meant to be used, they should be removed. If they are intended to be used, their functionality should be implemented.


Copilot AI left a comment


Pull request overview

Adds an SGLang-focused CNN/DailyMail preset and improves OpenAI msgspec compatibility by making optional response fields truly optional during struct construction/decoding.

Changes:

  • Make msgspec OpenAI response fields (refusal, usage, system_fingerprint) default to None for better OpenAI-compat decoding/serialization.
  • Extend Harmonize with a mode option (harmony vs plain) and add a per-row guard to avoid overwriting pre-tokenized rows in fused row-processor pipelines.
  • Add a new llama3_8b_sglang CNN/DailyMail preset that formats prompts and produces input_tokens via Harmonize(mode="plain").

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
src/inference_endpoint/openai/types.py Makes msgspec OpenAI response structs tolerant of omitted optional fields.
src/inference_endpoint/dataset_manager/transforms.py Adds Harmonize plain-tokenization mode and overwrite-prevention in fused pipelines.
src/inference_endpoint/dataset_manager/predefined/cnndailymail/presets.py Adds an SGLang-oriented Llama 3.1 8B preset producing input_tokens.


Comment on lines +185 to +186
    if self.tokenized_column in row and row[self.tokenized_column] is not None:
        return row

Copilot AI Mar 9, 2026


The pre-tokenized guard treats any non-None value as “already tokenized”. In pandas rows, missing values are often NaN (which is not None), so this would incorrectly skip tokenization and leave NaN in input_tokens, likely breaking downstream code that expects a list of token IDs. Consider using an explicit null check that treats NaN as missing (e.g., via pd.isna) before returning early.

Comment on lines +154 to +156
    self.mode = mode
    if self.mode not in {"harmony", "plain"}:
        raise ValueError(f"Invalid harmonize mode: {self.mode}")

Copilot AI Mar 9, 2026


Harmonize.__call__ still skips purely based on the presence of tokenized_column in df.columns, but process_row now skips only when the per-row value is non-null. This makes behavior differ depending on whether row processors are fused (or if fuse_row_processors=False is used). Consider aligning the dataframe-level skip logic with the row-level guard so the transform behaves consistently.

Comment on lines 176 to +191

         Returns:
             Row dictionary with the harmonized prompt added
         """
-        row[self.tokenized_column] = self.harmonizer(row[self.prompt_column])
+        # Guard pre-tokenized rows: the SGLang adapter adds a default Harmonize
+        # (GPT-OSS tokenizer + harmony mode). When row processors are fused, the
+        # dataframe-level skip is bypassed, so without this guard, adapter
+        # Harmonize would overwrite input tokens. Alternative: remove Harmonize
+        # from the adapter transforms and require each SGLang preset to add its
+        # own Harmonize with the desired tokenizer/args.
+        if self.tokenized_column in row and row[self.tokenized_column] is not None:
+            return row
+        if self.mode == "plain":
+            tokens = self.harmonizer.to_tokens(row[self.prompt_column])
+            row[self.tokenized_column] = tokens
+        else:
+            row[self.tokenized_column] = self.harmonizer(row[self.prompt_column])

Copilot AI Mar 9, 2026


This change adds new Harmonize behavior (mode plus the overwrite-prevention guard when row processors are fused), but tests/unit/dataset_manager/test_transforms.py explicitly excludes Harmonize. Please add unit tests that cover (1) mode="plain" vs mode="harmony", and (2) fused pipelines where a second Harmonize should not overwrite existing input_tokens.
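A test along the lines requested could be sketched against a stub. `StubHarmonize` below is a hypothetical stand-in (not the real class in transforms.py): its tokenizer calls are placeholders, but it reproduces the two behaviors under test, the mode switch and the overwrite guard hit when row processors are fused.

```python
class StubHarmonize:
    """Hypothetical stand-in for Harmonize, reduced to guard + mode logic."""

    def __init__(self, mode: str = "harmony"):
        if mode not in {"harmony", "plain"}:
            raise ValueError(f"Invalid harmonize mode: {mode}")
        self.mode = mode
        self.prompt_column = "prompt"
        self.tokenized_column = "input_tokens"

    def _plain_tokens(self, text):
        # Placeholder for the tokenizer's to_tokens(); real code tokenizes.
        return [ord(c) for c in text]

    def _harmony(self, text):
        # Placeholder for Harmony-format rendering; marked with a leading 0.
        return [0] + self._plain_tokens(text)

    def process_row(self, row):
        # Overwrite guard: a second (fused) Harmonize must not clobber tokens.
        if self.tokenized_column in row and row[self.tokenized_column] is not None:
            return row
        if self.mode == "plain":
            row[self.tokenized_column] = self._plain_tokens(row[self.prompt_column])
        else:
            row[self.tokenized_column] = self._harmony(row[self.prompt_column])
        return row
```

The same assertions, pointed at the real Harmonize with a real tokenizer, would cover the gap the comment identifies.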

Comment on lines 83 to +113

@@ -109,5 +109,5 @@ class ChatCompletionResponse(msgspec.Struct, kw_only=True, omit_defaults=True):
     created: int
     model: str
     choices: list[ChatCompletionChoice]
-    usage: CompletionUsage | None
-    system_fingerprint: str | None
+    usage: CompletionUsage | None = None
+    system_fingerprint: str | None = None

Copilot AI Mar 9, 2026


There are no unit tests covering the msgspec OpenAI types / msgspec adapter decode path. Since these fields now default to None to support responses that omit them, it would be good to add a test that decodes a minimal OpenAI-compatible response missing refusal, usage, and system_fingerprint and asserts decoding succeeds and fields are None.

Comment on lines 157 to 159
    self.harmonizer = Harmonizer(
        tokenizer_name=tokenizer_name,
        encoding_name=encoding_name,

Copilot AI Mar 9, 2026


In mode="plain", process_row only calls self.harmonizer.to_tokens(...), but Harmonizer.__init__ still loads the Harmony encoding and constructs Harmony system content. That’s potentially expensive and unnecessary for plain tokenization. Consider a lightweight path for plain mode (e.g., defer encoding load until __call__ is used, or use the underlying tokenizer directly) to reduce init overhead.

attafosu and others added 3 commits March 9, 2026 13:42
Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
* Handle case with string response

Handles the case where the response is a single string rather than a list; needed for the AMD submission, which was not calculating TPOT without this fix.
---------

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@attafosu attafosu force-pushed the feat/attafosu/sglang-openai-api-compatibility branch from ff66399 to 4dd91ee Compare March 9, 2026 20:52
@attafosu attafosu closed this Mar 9, 2026
@github-actions github-actions bot locked and limited conversation to collaborators Mar 9, 2026